Automated matching of data mining dataset schemata to background knowledge
Authors
Abstract
Interoperability in data mining is supported by a standard for dataset and model representation: the Predictive Model Markup Language (PMML). It describes the columns of the source data table (which are continuous, categorical or ordinal), preprocessing transformations (such as discretization of continuous values), and the structure of the discovered model (e.g. a neural network or a set of association rules). In addition to source data, the input to a mining task typically includes expert-provided background knowledge. This knowledge may concern, for example, standard ways of discretizing numerical quantities (e.g. the boundary between ‘normal blood pressure’ and ‘hypertension’, which makes discovered hypotheses easier to read intuitively), or it may itself take the form of predictive models against which the discovered models can be compared during or after the mining process. The proposed Background Knowledge Exchange Format (BKEF) [?], in many respects similar to PMML, aims to support interoperability of data mining (and related) applications that deal with background knowledge. Case studies [?] showed that one BKEF model typically has to be aligned with several PMML models (from different mining sessions in the same domain). The alignments are stored in the Field Mapping Language (FML) [?], which expresses that a data field (column) in a PMML model semantically corresponds to an abstract ‘field’ in a BKEF model. However, writing FML alignments (analogous to instances of the Alignment Format [?] used in ontology matching) by hand is tedious, and recognizing suitable correspondences may be hard; partial automation is thus desirable. Existing tools for ontology/schema matching cannot be applied directly: compared to ontologies, PMML (and even BKEF) is more strongly shaped by data structures, while BKEF is more abstract and more weakly structured than database schemata. Therefore, specific methods (inspired by existing ones) and a new tool have been devised.
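To make the matching task concrete, the following minimal sketch suggests correspondences between PMML data fields and background-knowledge fields by plain string similarity. It is an illustration only and not the method or tool described in the abstract. The sample PMML fragment, the list `BACKGROUND_FIELDS` (a stand-in for field names taken from a BKEF model), the helper names and the 0.5 threshold are all assumptions made for this example; neither the BKEF nor the FML syntax is reproduced here.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical PMML fragment; only the DataDictionary part matters here.
PMML_SNIPPET = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <DataDictionary numberOfFields="3">
    <DataField name="systolic_bp" optype="continuous" dataType="double"/>
    <DataField name="age" optype="continuous" dataType="integer"/>
    <DataField name="smoker" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

# Assumed stand-in for the abstract field names of a BKEF model.
BACKGROUND_FIELDS = ["Systolic blood pressure", "Age", "Smoking"]


def pmml_field_names(pmml_text):
    """Return the names of all DataField elements, ignoring XML namespaces."""
    root = ET.fromstring(pmml_text.strip())
    return [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "DataField"
    ]


def normalize(name):
    """Lower-case a name and treat underscores as spaces."""
    return name.lower().replace("_", " ").strip()


def similarity(a, b):
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_alignments(data_fields, background_fields, threshold=0.5):
    """For each data field, propose the most similar background field.

    Pairs scoring below the (arbitrary) threshold are left unmatched,
    so that a human can decide them by hand.
    """
    suggestions = []
    for data_field in data_fields:
        best = max(background_fields, key=lambda bf: similarity(data_field, bf))
        score = similarity(data_field, best)
        if score >= threshold:
            suggestions.append((data_field, best, round(score, 2)))
    return suggestions


if __name__ == "__main__":
    for data_field, background_field, score in suggest_alignments(
        pmml_field_names(PMML_SNIPPET), BACKGROUND_FIELDS
    ):
        print(f"{data_field} -> {background_field} (similarity {score})")
```

A name-based heuristic of this kind can only propose candidates. It ignores value ranges, field types and domain synonyms, which is one reason the suggestions would still need review before being recorded as an alignment.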
Similar resources
A Data Mining approach for forecasting failure root causes: A case study in an Automated Teller Machine (ATM) manufacturing company
Based on the findings of Massachusetts Institute of Technology, organizations’ data double every five years. However, the rate of using data is 0.3. Nowadays, data mining tools have greatly facilitated the process of knowledge extraction from a welter of data. This paper presents a hybrid model using data gathered from an ATM manufacturing company. The steps of the research are based on CRISP-D...
Automated detection of coronavirus disease (COVID-19) by using data-mining techniques: a brief report
Background: The clinical field has a vast amount of patient data that has not been analyzed. Discovering a way to analyze this raw data and turn it into an information treasure can save many lives. Using data mining methods is an efficient way to analyze this large amount of raw data. It can predict the future with accurate knowledge of the past, providing new insights into disease diagnosis and prevention. S...
MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS
This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...
Automated Ontology Creation using XML Schema Elements
Ontologies are commonly used to represent formal semantics in a computer system, usually capturing them in the form of concepts, relationships and axioms. Axioms convey asserted knowledge and support inferring new knowledge through logical reasoning. For complex systems, the process of creating ontologies manually can be tedious and error-prone. Many automated methods of knowledge discovery are...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we want to improve association rules in order to be used in recommenders. Recommender systems present a method to create the personalized offers. One of the most important types of recommender systems is the collaborative filtering that deals with data mining in user information and offering them the appropriate item. Among the data mining methods, finding frequent item sets and ...